DETAILED ACTION
Response to Amendment 

1. It is hereby acknowledged that the following papers have been received and
placed on record in the file: Amendment as received on 2-01-05. 

2. Claims 1-44 have been examined.
Status of claims: 

3. Claims 1-4, 18, 20-27, 30-39, and 42-44 are rejected under 35 U.S.C. 102(e) as 
being anticipated by Hirai et al., Patent # 6,526,215, hereinafter Hirai. 

4. Claims 28 and 29 are rejected under 35 U.S.C. 102(e) as being anticipated by 
Ratakonda, Patent # 5,956,026. 

5. Claims 19, 40, and 41 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Hirai and Ratakonda. 

6. Claims 5-17 have been canceled by the applicant.

Claim Rejections - 35 USC § 102 

7. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

8. Claims 1-4, 18, 20-27, 30-39, and 42-44 are rejected under 35 U.S.C. 102(e) as 
being anticipated by Hirai et al., Patent # 6,526,215, hereinafter Hirai. 

9. With regard to claim 1, which teaches a method of generating key frames
comprising the steps of: receiving a video stream and dividing it into a plurality of
sections, each section including a plurality of frames, Hirai teaches, in column 4, lines
45-64 and column 2, lines 43-53, an apparatus for receiving moving picture data, dividing
it into scenes, and further dividing it into cuts. With regard to claim 1, further teaching
selecting a key region from each of the plurality of sections, and combining the selected
key region from each of the plurality of sections to form a synthetic key frame, each
selected one of the key frame and the key region corresponding to a portion of a frame
smaller than the total frame size, Hirai teaches, in column 4, lines 52-55, column 2, lines
42-60 and column 10, lines 40-45, the extracting of still images (M-icons) representing
each of the scenes and cuts, these still images being representatives (key frames) of
these subdivisions. Hirai further teaches, in column 15, lines 11-47, the creation of an
M-icon by extracting information from an image and creating the M-icon based on an
abstract of the obtained detected information. It is further clearly shown that if a video
sequence is divided into parts, the parts are smaller than the total. Figure 1 further
shows the hierarchical structure in which the levels of M-icons are depicted, each level
going down representing a finer level of the video space.

10. With regard to claim 2, which teaches the dividing step including receiving video 
from a second source, Hirai teaches, in column 3, line 8 and in figure 5, receiving input
from sources such as a movie, and a still image. 



Application/Control Number: 09/800,999

Art Unit: 2173 

11. With regard to claim 3, which teaches the selecting step of including a key region
output from a second source, Hirai teaches, in column 3, line 8 and in figure 5, receiving 
input from sources such as a movie, and a still image. 

12. With regard to claim 4, which teaches that a section is a unit of a segment, Hirai 
teaches, in column 4, line 45, that a scene (section) comprises a plurality of cuts
(segments). 

13. With regard to claim 18, which teaches a hierarchical video summary method
comprising means of dividing a video stream into a plurality of sections where each
section includes a plurality of frames, Hirai teaches, in column 4, lines 45-64 and column
2, lines 43-53, an apparatus for receiving moving picture data, dividing it into scenes,
and further dividing it into cuts. With regard to claim 18, further teaching synthesizing a
key region of each section into one image, to generate a synthetic key frame, wherein
each key region corresponds to a portion of a frame smaller than the total frame size,
Hirai teaches, in column 4, lines 52-55, column 2, lines 42-60 and column 10, lines
40-45, the extracting of still images (M-icons) representing each of the scenes and cuts,
these still images being representatives (key frames) of these subdivisions.
Hirai further teaches, in column 15, lines 11-47, the creation of an M-icon by extracting
information from an image and creating the M-icon based on an abstract of the obtained
detected information. It is further clearly shown that if a video sequence is divided into
parts, the parts are smaller than the total. With regard to claim 18, further teaching
assigning the synthetic key frames to a key image locator, a hierarchical summary list
for describing lower summary structures, and structural information, Hirai further
teaches, in column 11, lines 9-15, information being contained in the elements of a
hierarchical structure giving address information and video structural information.

14. With regard to claim 20, which teaches that each hierarchical summary structure 
is represented by an image representative of a specific segment, Hirai teaches, in 
column 16, line 1 and in figure 1, how in the hierarchy each M-icon (key frame) has its
own information zone. 

15. With regard to claim 21, which teaches that each component of the lower
hierarchical summary list uses a hierarchical/recursive summary structure as a lower 
hierarchical summary structure, Hirai teaches, in column 9, line 7 and in figure 1, how 
the hierarchy is organized from the top level story to the next level scenes to the next 
level of cuts where a cut is a subset of a scene, and each of these M-icons (key frames) 
has a summary element. 

16. With regard to claim 22, which teaches that the hierarchical summary structure 
has summary level information, Hirai teaches, in column 10, lines 56-67, that each icon 
is given a layer level value (1,2,... from the bottom layer). 

17. With regard to claim 23, which teaches the hierarchical summary structure
including a fidelity value, Hirai teaches, in column 5, line 6, a level of degrees of
abstraction as associated with frame images.

18. With regard to claim 24, which teaches a method for providing a video browsing
interface comprising: dividing a video stream into a plurality of sections, and
synthesizing a key region representing content of each section into one image, to
generate a synthetic key frame, wherein each key region represents important
information regarding the respective frame, Hirai teaches, in column 4, lines 45-64 and
column 2, lines 43-53, an apparatus for receiving moving picture data, dividing it into
scenes, and further dividing it into cuts. Hirai teaches, in column 4, lines 52-55, column
2, lines 42-60 and column 10, lines 40-45, the extracting of still images (M-icons)
representing each of the scenes and cuts, these still images being
representatives (key frames) of these subdivisions. Hirai further teaches, in column 15,
lines 11-47, the creation of an M-icon by extracting information from an image and
creating the M-icon based on an abstract of the obtained detected information. With
regard to claim 24, further teaching providing a user interface to a predetermined
display to browse a video related to the generated synthetic key frame, Hirai also
teaches, in column 4, line 60, a means for displaying said hierarchical structure and
related information to the user.

19. With regard to claim 25, which teaches the user interface providing the synthetic 
key frame in the form of view, Hirai teaches, in figure 1, a visual representation of the
M-icons (key frames).

20. With regard to claim 26, which teaches key frames being arranged in a time 
sequence, and the key frames arranged in a tree shape, Hirai teaches, in column 9, 
lines 35-51 and in conjunction with figures 8 and 10, how all of the M-icons are arranged 
in the order in which they occurred in the inputted moving picture, and that they are 
displayed in a hierarchical tree structure. 

21. With regard to claim 27, which teaches key frames assigned to each node in a
TOC form, Hirai teaches, in column 9, lines 25-34 and in conjunction with figures 8 and
10, managing information in an organized form, and then displays a TOC (figure for the
hierarchy in figure 10.

22. With regard to claims 30, 32, 34, 36, and 38, which teach the synthetic key frame 
including a selected key region from each of the plurality of sections, Hirai teaches, in 
column 4, lines 45-64, a group of frames combined into a representative frame. 

23. With regard to claims 31, 33, 35, 37, and 39, which teach each of the plurality of
sections comprising a video frame, and the selected key region comprises a portion of 
the video frame, Hirai teaches, in column 4, lines 45-64, a group of frames (video 
sequence or clip) combined into a representative frame and in column 3, lines 1-4, the 
association between the pictures in the hierarchical structure and their associated moving
picture sequences. 

24. With regard to claim 42, which teaches a hierarchical summary information
structure to be used for summarizing a source multimedia content comprising: a plurality
of hierarchical summary element information structures, Hirai teaches, in column 4, lines
45-64 and column 2, lines 43-53, an apparatus for receiving moving picture data,
dividing it into scenes, and further dividing it into cuts. Figure 1 further shows the
hierarchical structure in which the levels of M-icons are depicted, each level going down
representing a finer level of the video space. With regard to claim 42, further teaching
the hierarchical summary element information structure including: a key image locator,
Hirai teaches, in column 4, lines 52-55, a selecting means for extracting a still image
representing each of the scenes or cuts. Hirai further teaches, in column 11, lines 9-15,
information being contained in the elements of a hierarchical structure giving address
information and video structural information. With regard to claim 42, further teaching
the hierarchical summary element information structure including: a list of sub
hierarchical summary element information structures, and further
teaching synthesizing a key region of each section into one image, to generate a
synthetic key frame, wherein each key region corresponds to a portion of a frame
smaller than the total frame size, Hirai teaches, in column 4, lines 52-55, column 2, lines
42-60 and column 10, lines 40-45, the extracting of still images (M-icons) representing
each of the scenes and cuts, these still images being representatives (key
frames) of these subdivisions. With regard to claim 42, further teaching the
hierarchical summary element information structure including: a summary level, Hirai
teaches, in column 10, lines 56-67, that each icon is given a layer level value (1, 2, ...
from the bottom layer). With regard to claim 42, further teaching the hierarchical
summary element information structure including: a fidelity indicating how well the
hierarchical summary element information is represented by a hierarchical summary
element information in a higher level, Hirai teaches, in column 5, line 6, a level of
degrees of abstraction as associated with frame images.

25. With regard to claim 43, which teaches the key image locator including a
synthetic key frame locator, Hirai teaches, in column 4, lines 52-55, a selecting means
for extracting a still image (M-icon) representing each of the scenes or cuts. Hirai further
teaches, in column 10, lines 40-45, the creation of a new M-icon from two existing
M-icons (showing that an M-icon can be synthetic).

26. With regard to claim 44, which teaches the synthetic key frame being an image
that is not in the source multimedia content, Hirai teaches, in column 10, lines 40-45, 
the creation of a new M-icon from two existing M-icons. 

27. Claims 28 and 29 are rejected under 35 U.S.C. 102(e) as being anticipated by 
Ratakonda, Patent # 5,956,026. 

28. With regard to claim 28, which teaches dividing video into a plurality of sections 
where each section includes a plurality of frames, and synthesizing a key region 
representing content of each section into one image, to generate a synthetic key frame, 
each selected key region corresponding to a portion of a frame smaller than the total 
frame size, Ratakonda teaches in column 2, lines 13-27, column 4, lines 35-63, and in 
column 9, lines 40-43, generating a summary of a video based on key frames by 
detecting shot boundaries to determine regions, and locating representative shots of the 
region, where the representative shots (key frames) can be clusters of other shots. 
Figure 5 further shows the hierarchical structure in which the levels of key frames are
depicted, each level going down representing a finer level of the video space. With
regard to claim 28, further teaching providing a user interface to a predetermined 
display to browse a video related to the generated synthetic key frame, Ratakonda 
teaches, in column 13, line 35, providing a user interface. With regard to claim 28, 
further teaching selecting the synthetic key frame according to an input by a user, and 
reproducing a segment represented by the selected synthetic key frame, Ratakonda 
teaches, in column 5, lines 56-63, clicking on a particular frame and being able to 
display a normal playback of a video sequence. 

29. With regard to claim 29, which teaches a reproducing step that reproduces a
segment related with constituent elements of the contents of the key frame, Ratakonda 
teaches in column 5, lines 56-63, clicking on a particular frame and being able to display 
a normal playback of a video sequence. 

Claim Rejections - 35 USC § 103 

30. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

31. Claims 19, 40, and 41 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Hirai and Ratakonda. Hirai teaches a method of generating
representative still images from a video sequence, a locating means comprising: a
means of calculating a fidelity value (see column 5, line 6), an annotation (see column
11, lines 9-15), a list of related segments (see column 11, lines 9-15), and information
on arrangement (see column 11, lines 9-15), but does not teach elements in a key frame
list containing locators, the locator including an inherent ID, an image locator to locate
the image, a representative segment locator, or a key image locator being a structure
for designating an image using a key image locator, a key frame locator, a s-key frame
locator. Ratakonda teaches a video summarization system similar to that of Hirai, but
further teaches elements in a key frame list containing locators, the locator including an
inherent ID, an image locator to locate the image, a representative segment locator,
and a key image locator being a structure for designating an image using a key image
locator, a key frame locator, a s-key frame locator. With regard to claim 8, Ratakonda
teaches, in column 6, lines 45-67, a means for locating items in a key frame list. It
would have been obvious to one of ordinary skill in the art, having the teachings of Hirai
and Ratakonda before him at the time the invention was made, to modify the method of
generating representative still images of a video sequence of Hirai to include the
representative image locating means of Ratakonda. One would have been motivated to
make such a combination because locating these key frames and the video segment
they represent can help in editing and/or viewing the video sequence.

32. With regard to claim 19, which teaches a key image locator being a structure for 
designating an image using: a key image locator, a key frame locator, and a s-key 
frame locator, Ratakonda teaches, in column 6, lines 45-67, a locator method for 
locating key images of all levels of the hierarchy, and all the frames which they 
represent. It would have been obvious to one of ordinary skill in the art, having the 
teachings of Hirai and Ratakonda before him at the time the invention was made to 
modify the method of generating still images of Hirai to include a locator method for 
locating key images of all levels of the hierarchy, and all the frames which they 
represent of Ratakonda. One would have been motivated to make such a combination 
because the inclusion of these location and identification elements will help the user in 
using the video software. 

33. With regard to claim 40, which teaches the synthetic key frame including a 
selected key region from each of the plurality of sections, Hirai further teaches, in 
column 4, lines 45-64, a group of frames combined into a representative frame. 

34. With regard to claim 41, which teaches each of the plurality of sections
comprising a video frame, and the selected key region comprises a portion of the video
frame, Hirai further teaches, in column 4, lines 45-64, a group of frames combined into a
representative frame and in column 3, lines 1-4, the association between the pictures in
the hierarchical structure and their associated moving picture sequences.

Response to Arguments 

35. The arguments filed on 2-01-05 have been fully considered but they are not 
persuasive. The reasons are set forth below. 

36. With respect to the applicant's argument that Hirai does not teach or suggest the
key region and synthetic key frame as recited in claim 1.

37. In response, the examiner respectfully submits that Hirai teaches, in column 15, 
lines 35-47, column 9, lines 5-15, and in column 10, lines 40-45, combining two M- 
icons, each representative of a region, to form a combined (synthetic) M-icon. Hirai 
teaches, in column 2, lines 43-61 and in column 4, lines 45-60, an M-icon representing 
a scene or cut (representative portion) of a video sequence (key region), and from this 
representative portion, a second, finer, representative image is created. These images 
are placed in a hierarchical structure, with the top broad image representing the entire 
story, and the bottom images representing a particular finer element of the entire story. 

38. With respect to the applicant's argument that Hirai does not teach or suggest
each selected key region corresponding to a portion of a frame smaller than a total size. 

39. In response, the examiner respectfully submits that Hirai teaches, in column 4,
lines 45-52, having a moving picture and dividing the moving picture up into scenes and
cuts, where if an item is divided up into portions these portions must be smaller than the
whole.

40. With respect to the applicant's argument that M-icons are not key regions from a
video stream. 

41. In response, the examiner respectfully submits that Hirai shows, in column 2,
lines 43-53, that an M-icon can represent a cut, a scene, or a plurality of scenes,
where the detecting means divides the video up into scenes and cuts by detecting
change points (see column 4, lines 46-60).

42. With respect to the applicant's argument that Hirai does not suggest assigning a
synthetic key frame to a key image locator, a hierarchical summary list for describing 
lower summary structures, and a structural information of the video stream. 

43. In response, the examiner respectfully submits that Hirai teaches, in column 10,
line 56 through column 11, lines 9-15 and in column 12, lines 59-67, the icon containing
information used to locate its location within the moving picture, and information about
its layout value in the hierarchical structure, where this layout information is a number
up from the bottom layer giving structural information.

44. With respect to the applicant's argument that Ratakonda does not teach or
suggest a key region or a synthetic key frame. 

45. In response, the examiner respectfully submits that Ratakonda teaches, in
column 4, lines 35-63, the partitioning of video data into shots (sequences) by boundary
detection for key frame selection. Ratakonda further teaches, in column 9, lines 40-43,
that key frames can be grouped to form representative key frames. This produces a new
key frame that represents a plurality of other key frames.

46. With respect to the applicant's argument that claim 19 distinguishes over the
applied references. 

47. In response, the examiner respectfully submits that Ratakonda teaches, in 
column 6, lines 45-67, a locator method for locating key images of all levels of the 
hierarchy, and all the frames which they represent. 

Conclusion 

48. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Dennis G. Bonshock whose telephone number is (571) 
272-4047. The examiner can normally be reached on Monday - Friday, 6:30 a.m. - 4:00 
p.m. 

49. If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, John Cabeca, can be reached on (571) 272-4048. The fax phone number
for the organization where this application or proceeding is assigned is 703-872-9306. 

50. Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

3-11-05 
dgb 



SUPERVISORY PATENT EXAMINER
TECHNOLOGY CENTER 2100




